Dog Breed Identification

Practical Deep Learning Workshop 2021 - Multi-Class Image Classification

Inbal Biton, Shachar Meretz

For this assignment we chose the Kaggle dataset Dog Breed Identification (https://www.kaggle.com/c/dog-breed-identification). This dataset contains pictures of dogs of different breeds, and in this image classification project we will try to predict the breed of the dog in each picture.

Exploratory data analysis of the dataset

b. What data does each sample contain?

Train Set Dimensions Distribution:

Histogram of the label samples:

Benchmarks that we found in Kaggle competitions for the Dog Breed Identification dataset:

  1. VGGNet 19 : Accuracy: 83% Log Loss: 0.56

  2. Inception V3: Accuracy: 87% Log Loss: 0.47

  3. ResNet50: Accuracy: 90% Log Loss: 0.38

  4. Xception: Accuracy: 89% Log Loss: 0.42

  5. DenseNet: Accuracy: 91% Log Loss: 0.36

  6. SENet: Accuracy: 89% Log Loss: 0.38

  7. ResNext: Accuracy: 93% Log Loss: 0.22

  8. InceptionV4: Accuracy: 94% Log Loss: 0.20

  9. InceptionResnetV2: Accuracy: 95% Log Loss: 0.19

  10. Ensembling InceptionResNetV2, InceptionV4 and ResNext: Accuracy: 96% Log Loss: 0.16

In addition, the suggested input size is in the range (299-400, 299-400).

Log loss values that other models achieved (obtained through 5-fold cross-validation; between folds the score may vary from under 0.17 to 0.26):

  1. inception_4_300 - 0.228

  2. inception_4_350 - 0.211

  3. inception_4_400 - 0.204

  4. inception_4_450 - 0.223

  5. inceptionresnet_2_300 - 0.239

  6. inceptionresnet_2_350 - 0.217

  7. inceptionresnet_2_400 - 0.215

  8. inceptionresnet_2_450 - 0.222
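The log-loss metric used in all these benchmarks is the multi-class cross-entropy. A minimal NumPy sketch (the clipping convention mirrors Kaggle's, and is an assumption here):

```python
import numpy as np

def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability of the true class.

    y_true: (n,) integer class labels
    y_prob: (n, k) predicted class probabilities
    """
    y_prob = np.clip(y_prob, eps, 1 - eps)               # avoid log(0)
    y_prob = y_prob / y_prob.sum(axis=1, keepdims=True)  # renormalize after clipping
    return float(-np.mean(np.log(y_prob[np.arange(len(y_true)), y_true])))

# A uniform guess over 120 breeds scores ln(120) ~ 4.787,
# which puts the benchmark values near 0.2 in perspective.
uniform = np.full((4, 120), 1 / 120)
print(multiclass_log_loss(np.array([0, 5, 17, 119]), uniform))  # ~4.787
```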

What our dataset looks like:

Challenges:

As we can see, there are many pictures that we can assume will be challenging for our model: pictures with another animal (like a sheep or a cat), pictures with two different dogs, pictures with people, and pictures with a lot of background noise (like a supermarket or a nature scene).

Possible misclassifications:

As we can see, some dog breeds look similar to others, and we assume that our model can easily misclassify them.

Easy classifications:

Examples of augmentation:

  1. Geometric transformations
  2. Flipping
  3. Cropping
  4. Rotation
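The augmentations above can be sketched with plain NumPy array operations (a minimal illustration; in practice a real pipeline would typically use a library utility such as Keras's image preprocessing):

```python
import numpy as np

def augment(img, rng):
    """Return a randomly flipped, rotated and cropped copy of an HxWxC image."""
    if rng.random() < 0.5:
        img = np.fliplr(img)                 # horizontal flip
    img = np.rot90(img, k=rng.integers(4))   # rotation by a multiple of 90 degrees
    h, w = img.shape[:2]
    ch, cw = int(h * 0.9), int(w * 0.9)      # random crop to 90% of each side
    top = rng.integers(h - ch + 1)
    left = rng.integers(w - cw + 1)
    return img[top:top + ch, left:left + cw]

rng = np.random.default_rng(0)
out = augment(np.zeros((300, 300, 3)), rng)
print(out.shape)  # (270, 270, 3)
```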

Building the NN model:

We form a neural network and use 5-fold cross-validation to measure model performance. The fit-model function creates the 5-fold split and fits the model weights on the data. Moreover, for each fold it runs up to 30 epochs and uses callbacks for EarlyStopping (stopping if the model's validation accuracy has not improved for 10 epochs) and for saving the best weights. At the end, it displays the accuracy and loss of each fold.
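The cross-validation loop can be sketched as follows. The fold splitting is runnable NumPy; the names in the comment (`build_model`, `x`, `y`) are hypothetical stand-ins for the notebook's own variables:

```python
import numpy as np

def kfold_indices(n_samples, n_splits=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, n_splits)
    for i in range(n_splits):
        val_idx = folds[i]
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_idx, val_idx

# Inside the notebook, each fold would then run something like (hypothetical names):
#   model = build_model()
#   model.fit(x[train_idx], y[train_idx],
#             validation_data=(x[val_idx], y[val_idx]), epochs=30,
#             callbacks=[EarlyStopping(monitor="val_accuracy", patience=10,
#                                      restore_best_weights=True)])
for tr, va in kfold_indices(100):
    print(len(tr), len(va))  # 80 20, five times
```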

Fit model:

First model:

The basic model. The input shape of this model is a picture of 375x375x3. The model contains 2 blocks, each consisting of a convolution layer, a MaxPool layer and a Dropout layer. After these 2 blocks comes the Flatten layer. The output layer is a Dense layer with a softmax activation function and 120 units representing the 120 different dog breeds.
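A sketch of such a basic model in Keras. The filter counts and dropout rates are assumptions; the text above only specifies the overall block structure:

```python
from tensorflow.keras import layers, models

def build_basic_model(input_shape=(375, 375, 3), n_classes=120):
    """Two conv/pool/dropout blocks, then flatten and a softmax output."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # block 1
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Dropout(0.2),
        # block 2
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Dropout(0.2),
        layers.Flatten(),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_basic_model()
print(model.output_shape)  # (None, 120)
```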

First improvement:

From the results of the first model we see that the accuracy is low and the loss is high, and that we have overfitting.

As we learned in the lectures there are several ways to improve the model of a neural network:

We learned that we can use convolution layers, add batch-normalization layers to shorten learning time and reach convergence faster, and in addition use dropout and pooling layers, which are not learned layers and do not add parameters to the model.

We decided to improve the model by adding convolution layers with an ascending number of filters - this creates more depth, produces more features for the Flatten layer, and reduces the image dimensions.

We also added a batch-normalization layer at the end of the convolution stage - this layer improves the learning speed of the model and makes the process more stable.

To make our model generalize better, we decided to increase the dropout rate in order to randomly drop more activations, and to reduce the dimensions of the input images to 300x300.
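The effect of the batch-normalization step can be illustrated in a few lines of NumPy: each feature is rescaled to zero mean and unit variance over the batch (the learned scale and shift parameters are omitted in this sketch):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature of a (batch, features) array to mean 0, variance 1."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(64, 8))
z = batch_norm(x)
print(z.mean(axis=0).round(6))  # ~0 for every feature
print(z.std(axis=0).round(3))   # ~1 for every feature
```

Because every layer then receives inputs on a similar scale, gradient steps stay well-conditioned, which is why training converges faster.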

Second model:

Second improvement:

From the results of the second model we again see low accuracy and high loss, but the overfitting now builds up gradually rather than all at once.

In the second improvement we decided to add more convolution layers with a larger number of filters in order to get more depth and more features. In addition, we added a batch-normalization layer after each convolution layer and after each Dense layer - i.e. after each layer that performs a computation.

Moreover, after the Flatten layer we decided to stack several Dense layers in order to gradually reduce the number of features at each step, so that the most significant features remain for the classification.

Conclusion of the third improvement:

According to the latest results, the model's scores increased and the model became more accurate and less overfit.

Prediction samples of the new model:

Result score from Kaggle:

In addition to the two suggestions you applied, implement inference time augmentation and report the improvement in metrics you received.

We will change the input size to 100x100.

To make the model fit the new input dimensions, we will remove the last convolution layer with 2048 filters. We do this to make the augmentation process more efficient and to generalize the images.
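Inference-time (test-time) augmentation means averaging the model's predictions over several augmented copies of each image. A minimal sketch using a horizontal flip; `fake_predict` is a hypothetical stand-in for the trained model's `predict`:

```python
import numpy as np

def tta_predict(predict, images):
    """Average class probabilities over the original and horizontally flipped images."""
    probs = predict(images)
    probs_flipped = predict(images[:, :, ::-1, :])  # flip width axis (NHWC layout)
    return (probs + probs_flipped) / 2

# Stand-in for model.predict: a deterministic softmax-like output per image.
def fake_predict(batch):
    logits = batch.mean(axis=(1, 2, 3))[:, None] + np.arange(120)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

imgs = np.random.default_rng(0).random((4, 100, 100, 3))
out = tta_predict(fake_predict, imgs)
print(out.shape)  # (4, 120); each row still sums to 1
```

Averaging two valid probability distributions yields a valid distribution, so the output can be scored with the same log-loss metric as before.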

Conclusion of the augmentation model:

From the results we see that although the accuracy values of this model are lower, the level of overfitting is significantly smaller. We assume that with augmentation it takes the model longer to learn, and that if we increase the number of epochs per fold (which is possible since there is no overfitting now) we will get higher accuracy values.

Summary of improvements that helped:

1. Adding convolution layers with an ascending number of filters (from 32 to 2048).

2. Increasing the dropout rate from 0.2 to 0.5.

3. Reducing the dimensions of the input images to 300x300.

4. Adding a batch-normalization layer after each convolution layer and after each Dense layer.

5. Adding several Dense layers at the end of the model.

There are other ways to improve the model that we have not yet tried:

1. Reducing the input image dimensions below 300x300.

2. Changing the MaxPool layers to GlobalPool.

3. Increasing the number of epochs per fold.

Comparing these models:

Select a Trained Model Architecture

Creating a pre-trained model with the Xception architecture:

Fitting the Xception model:

We split the training set into 5 parts, with one part serving as the validation set during training.

After training the model we use it to produce predictions for the training set and the validation set, so that we can perform feature extraction with another model.
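A sketch of building such a feature extractor from Xception in Keras. Note that `weights=None` keeps this sketch offline; as described in this report, the actual notebook loaded `weights="imagenet"`:

```python
from tensorflow.keras.applications import Xception

# include_top=False drops the classification head; pooling="avg" turns the
# final feature maps into one 2048-dimensional vector per image.
extractor = Xception(weights=None, include_top=False,
                     input_shape=(299, 299, 3), pooling="avg")
print(extractor.output_shape)  # (None, 2048)
```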

On the extracted features we decided to train an SVM model and Logistic Regression:

We trained the SVM and Logistic Regression models on the features of the training set obtained from the Xception model - a matrix of size (number of images x 2048).

After training, we measure the accuracy on the features of the validation set obtained from the Xception model - a matrix of size (number of images x 2048).
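This step can be sketched with scikit-learn. The random array below is a synthetic stand-in for the (number of images x 2048) Xception feature matrix, with two artificially separated classes so the sketch is self-contained:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for the 2048-dim Xception features.
n, d = 200, 2048
X = rng.normal(size=(n, d))
y = rng.integers(2, size=n)
X[y == 1, :10] += 4.0  # shift class 1 along the first 10 dimensions

X_train, X_val = X[:150], X[150:]
y_train, y_val = y[:150], y[150:]

svm = SVC(kernel="linear").fit(X_train, y_train)
logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(svm.score(X_val, y_val), logreg.score(X_val, y_val))  # both near 1.0
```

Because both classifiers are linear in the 2048-dimensional feature space, training and prediction are much cheaper than a forward pass through the full network, which is the time-vs-accuracy tradeoff discussed below.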

Now we will compare these two models

In this comparison it can be seen that the trained Xception model reaches higher validation accuracy, while logistic regression achieved an accuracy value about 10 percent lower but with much less training and prediction time.

Project Conclusion:

In this project we performed multi-class image classification on the dog breed dataset. This was the first time we dealt with such a big dataset, and with neural networks and deep learning models in general. Before we started building a neural network model we performed data analysis on the dataset. We saw that we have a relatively small number of pictures for each dog breed and a large number of different dog breeds, so we concluded that learning would take long. The first model we built was a basic model containing a small number of layers. We wanted to see the performance of the model, and whether it learns specifically the training photos. We kept the model simple, so we did not use augmentation or many convolution layers. In the results of the model we saw that within a small number of epochs the model reaches overfitting, so we realized that we needed to improve the model. The way we chose to improve it was to add convolution layers, add BatchNormalization layers and increase the Dropout value. Finally we ran a model that augments the images. After analyzing the results of this model, we saw that although it takes the model longer to reach high accuracy values, it performs controlled and gradual learning on both the training data and the validation data. Judging by the progression of the values across all epochs, we assume that running this model for a higher number of epochs would yield higher accuracy values.

In those 3 models, the number of layers was already large and the models were deeper than the basic one. These models were challenging because runs took a long time and the code sometimes crashed. We worked on Google Colab and used a GPU, and sometimes tried to run the notebook via Anaconda on our own computers. If we were not limited in space and running time, we would increase the running time to a larger number of epochs and create a model with more layers so that it has more depth. In the second part we had to choose a trained model from those presented in the lectures, as well as other models we found online, and we chose the Xception model for this part. For this model, we removed the last layer and added another layer that adapts the trained model to our classification problem. We used trained weights from ImageNet and adjusted the image size to the input size of the trained model we chose. We trained the model on our dataset, dividing it so that 80% became the training data and 20% became the test data. During the analysis of the results we saw that this model achieves much higher values than the models we built in Part A. We can conclude that the depth and complexity of the model help it learn better.

Afterwards, we performed feature extraction on the trained model. To do this we removed the last layer and used the output we got from the model (2048 features). During the analysis we saw that the number of features coming out of this model is relatively small compared to other models online, and we therefore concluded that the feature extractor would yield relatively low values. In addition, we assumed that if we performed feature extraction on a larger number of features, the model's accuracy results would be higher. After extracting the features we trained models on our dataset to check whether a feature extractor can be effective. We chose 2 models to perform the prediction - SVM and Logistic Regression. We tested their accuracy values and compared them with the accuracy values of the original trained model. We saw that although the accuracy values are lower, the time it takes to run is shorter, so there is a tradeoff between time and accuracy.

In conclusion, this work was challenging, we learned a lot, and we feel we gained the tools and understanding needed to get started with machine learning.